Skip to content

Conversation

klensy
Copy link
Contributor

@klensy klensy commented Jul 1, 2022

Avoids bloating size of some rmeta\rlib files by not placing default string for StabilityLevel::Unstable reason multiple times, affects only stdlib\rustc artifacts. For stdlib cuts about 3% (diff of total size for patched\unpatched *.rmeta files of stage1-std) of file size, depending on crates.

fixes #88180

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 1, 2022
@rust-highfive
Copy link
Contributor

r? @lcnr

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 1, 2022
@klensy
Copy link
Contributor Author

klensy commented Jul 1, 2022

Or maybe encode with something like this, but values deduped?

/// A list of lazily-decoded values, with the added capability of random access.
///
/// Random-access table (i.e. offering constant-time `get`/`set`), similar to
/// `LazyArray<T>`, but without requiring encoding or decoding all the values
/// eagerly and in-order.
struct LazyTable<I, T> {
position: NonZeroUsize,
encoded_size: usize,
_marker: PhantomData<fn(I) -> T>,
}

@cjgillot
Copy link
Contributor

cjgillot commented Jul 2, 2022

There is precedent for such encoding, for instance type_shorthands or ExpnIndex.
Do you have an estimate of the amount of duplicated strings?

@klensy
Copy link
Contributor Author

klensy commented Jul 2, 2022

There is precedent for such encoding, for instance type_shorthands or ExpnIndex. Do you have an estimate of the amount of duplicated strings?

from #88180 when issue was created:

found 172 times in libaddr2line.rmeta
found 911 times in libhashbrown.rmeta
found 9590 times in libobject.rmeta

@klensy
Copy link
Contributor Author

klensy commented Jul 2, 2022

Checked now, for stage0 compiler artifact libobject.rmeta: 11237 times, for example.

@cjgillot
Copy link
Contributor

cjgillot commented Jul 2, 2022

Why not the following:

  • encode the integer value of Symbol;
  • encode somewhere else a FxHashMap<u32, String> of only actually encoded Symbols.

This would maximize the deduplication while being self-contained.
If we don't care about being self-contained, we can skip encoding the strings for static Symbols (ie. those listed in rustc_span::symbol).

What do you think?

@klensy
Copy link
Contributor Author

klensy commented Jul 2, 2022

impl<S: Encoder> Encodable<S> for Symbol {
fn encode(&self, s: &mut S) {
s.emit_str(self.as_str());
}
}
impl<D: Decoder> Decodable<D> for Symbol {
#[inline]
fn decode(d: &mut D) -> Symbol {
Symbol::intern(&d.read_str())
}
}

Something like replacing s.emit_str(self.as_str()); with with s.emit_u32(self.as_u32()); + saving in hash table, and in decode reading from that hash table? Only left to find the place where to save that hash table.

@cjgillot
Copy link
Contributor

cjgillot commented Jul 2, 2022

To generate those hash-tables, you can use specialization like we do for impl {De,En}codable for Span. Then, the hash-tables can be stored as any other in CrateRoot. We may want to only decode it once.

@cjgillot cjgillot self-assigned this Jul 3, 2022
@lcnr
Copy link
Contributor

lcnr commented Jul 4, 2022

r? @cjgillot

@cjgillot
Copy link
Contributor

@klensy do you still want to merge this, or should it be closed in favour of #98851?

@cjgillot cjgillot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 11, 2022
@bors
Copy link
Collaborator

bors commented Jul 14, 2022

☔ The latest upstream changes (presumably #98975) made this pull request unmergeable. Please resolve the merge conflicts.

@klensy
Copy link
Contributor Author

klensy commented Jul 19, 2022

I think this should be merged (and #98851 tested after that), as current default error message looks more like lint warn\error (that can possibly be translatable in future) and shouldn't be embed into compiled artifacts, will rebase.

@klensy klensy force-pushed the no-string-dupes-ugly branch from 1cc712d to b38c948 Compare July 21, 2022 20:13
@klensy
Copy link
Contributor Author

klensy commented Jul 24, 2022

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 24, 2022
@cjgillot
Copy link
Contributor

@bors r+ rollup=never
^ may have perf implications

@bors
Copy link
Collaborator

bors commented Jul 24, 2022

📌 Commit b38c948 has been approved by cjgillot

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2022
@bors
Copy link
Collaborator

bors commented Jul 25, 2022

⌛ Testing commit b38c948 with merge 7f93d4a...

@bors
Copy link
Collaborator

bors commented Jul 25, 2022

☀️ Test successful - checks-actions
Approved by: cjgillot
Pushing 7f93d4a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 25, 2022
@bors bors merged commit 7f93d4a into rust-lang:master Jul 25, 2022
@rustbot rustbot added this to the 1.64.0 milestone Jul 25, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (7f93d4a): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
0.3% 0.3% 1
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) N/A N/A 0

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.4% 2.4% 1
Improvements 🎉
(primary)
-1.3% -1.6% 6
Improvements 🎉
(secondary)
-2.0% -3.8% 7
All 😿🎉 (primary) -1.3% -1.6% 6

Cycles

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
2.2% 2.2% 1
Regressions 😿
(secondary)
2.9% 3.6% 5
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 2.2% 2.2% 1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rustc_attr::Stability inlined a lot of dublicates in rustc dependency crates rmeta's, increasing it's size
7 participants